ABSTRACT

When a processing unit combines the characteristics of a VLIW type
processing unit and a pipeline type processing unit, the processing
units executing operations in parallel refer to one another's
results. Register file transfers between the processing units are
therefore generated frequently, and the expected speed-up is not
fully obtained. To solve this problem, predicate registers are
provided together with a means for broadcasting the update data of a
predicate register to all processing units. Operations for obtaining
a branching condition and operations for obtaining a numerical value
can thereby be carried out in different processing units, and the
number of steps of the processing program can be reduced. In
addition, high speed transfer between the processing units of data
registers having a wider bit width is no longer required, so the
mounting area can be reduced and a high speed processing unit can be
realized.

13 Claims, 9 Drawing Sheets 
if ( a > 2 )
{
    a = a - 1;
}
b = b + a;

FIG. 2



    cmp  r1, #2
    ble  $1
    sub  r1, #1, r1
$1: add  r2, r1, r2

FIG. 3



001  cmp.gt r1, #2, p0
002  (p0) sub r1, #1, r1
003  add r2, r1, r2

FIG. 4
e = c + d;
g = (a + b) * f;
if ( e > 9 )
{
    g++;
    e = e + 2;
}

FIG. 7
501                        502

add    r1, r2, r7          add    r3, r4, r5
mul    r7, r6, r7          xfer   r5
cmp.gt r5, #9, p0          cmp.gt r5, #9, p0
(p0) add r7, #1, r7        (p0) add r5, #2, r5

FIG. 8
511                        512

add    r1, r2, r7          add    r3, r4, r5
mul                        cmp.gt r5, #9, p0, B
(p0) add                   (p0) add

FIG. 9



switch (a)
{
case 0: d = b + c;
        g = a + f;
        break;
case 1: d = b - c;
        g = e / f;
        break;
}

FIG. 10
VLIW SYSTEM WITH PREDICATED INSTRUCTION EXECUTION FOR
INDIVIDUAL INSTRUCTION FIELDS

BACKGROUND OF THE INVENTION

The present invention relates to a processing unit which has a
plurality of instruction fields in one instruction word register and
executes the instructions in those fields in parallel.

A traditional processing unit has generally been structured to
execute one process per instruction word and to process a stream of
instruction words in series, one by one.

A processing unit developed in recent years, on the other hand, has
an instruction system in which one instruction word carries a
plurality of instructions that are executed in parallel in order to
improve the execution speed. Such a processing unit is generally
called a VLIW (Very Long Instruction Word) type processing unit.

A processing unit of this type comprises a plurality of processing
units to execute a plurality of instructions in parallel. Moreover,
it has a plurality of register files corresponding to the processing
units so that the respective processing units can execute their
processing independently. When a particular process is executed using
such a plurality of processing units, data communication between the
processing units is generally indispensable. For this purpose, a
processing unit of this type has, for example, a means for
transferring register values between the processing units or a means
such as a common register which can be accessed from a plurality of
processing units. Such a processing unit is disclosed, for example,
in Japanese Patent Application Laid-Open No. 5-233281.

In addition to the means for realizing high speed execution explained
above, there is a processing unit in which the processing itself is
divided in time series into a plurality of stages and a plurality of
independent stages execute the processing in series. Processing units
of this kind are called pipeline type processing units.

It is known that processing units of this type show their maximum
performance when the instruction words are arranged in series. When
the instruction words are not arranged in series, for example when
conditional branching instructions are included, pipeline control is
disturbed and performance temporarily deteriorates.

To overcome such problems, processing units of this type have been
modified to reduce conditional branching. A typical method is the use
of a predicate register.

A predicate register is a register that qualifies instruction words
to determine whether or not they are executed. Use of the predicate
register remarkably reduces how often conditional branching
instructions are needed. To aid understanding of the present
invention described later, this technique will be briefly explained
with reference to the drawings.

FIG. 2 shows an example of a program written in the C language.
FIG. 3 shows an example in which the program of FIG. 2 is compiled
into the format applied to the processing unit of the related art.
FIG. 4 shows an example in which the program of FIG. 2 is compiled
into the format applied to a processing unit using a predicate
register. As shown in these figures, the arithmetic or logical
processes realized with conditional branching in FIG. 3 can be
realized in FIG. 4 without the conditional branching process. The
second line in FIG. 4 is an instruction word using the predicate
register. The comparison instruction of the first line writes the
comparison result into the first predicate register (p0). The
subtraction instruction of the second line is executed only when the
value stored in p0 is "true", as indicated by the "(p0)" preceding
the instruction word. If the value stored in p0 is "false", the
subtraction instruction of the second line is read but the
subtraction is not executed. With this execution method, conditional
branching processes can be reduced.
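
The equivalence between the branching form and the predicated form of the FIG. 2 program can be sketched in C. This is only an illustrative model under stated assumptions: the function names, the struct ab type and the use of a C bool for the predicate register p0 are not part of the disclosure, and the mapping of a to r1 and b to r2 follows the register usage seen in FIG. 3 and FIG. 4.

```c
#include <stdbool.h>

/* Final values of a and b after running the FIG. 2 program.
 * (Hypothetical helper type, introduced only for this sketch.) */
struct ab { int a; int b; };

/* Branching form, in the style of FIG. 3: the subtraction is skipped
 * by jumping over it when a <= 2. */
struct ab run_with_branch(int a, int b)
{
    if (a > 2) {
        a = a - 1;
    }
    b = b + a;
    return (struct ab){ a, b };
}

/* Predicated form, in the style of FIG. 4: there is no jump. The
 * subtraction is always read, but it only takes effect when the
 * predicate p0 holds "true". */
struct ab run_with_predicate(int a, int b)
{
    bool p0 = (a > 2);      /* 001  cmp.gt r1, #2, p0     */
    a = p0 ? a - 1 : a;     /* 002  (p0) sub r1, #1, r1   */
    b = b + a;              /* 003  add r2, r1, r2        */
    return (struct ab){ a, b };
}
```

Both functions return the same pair for any input; the predicated version simply replaces the control dependency on the branch with a data dependency on p0.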

However, when the means for realizing high speed processing explained
above are used in combination, namely when a processing unit is
structured to have the characteristics of both the VLIW type
processing unit and the pipeline type processing unit, the following
problems arise.

Since a plurality of processing units executing processes in parallel
refer to one another's processing results, register file transfers
are frequently generated between the processing units, and in some
cases the speed-up expected from parallel or pipeline processing
cannot be obtained.

In addition, high speed processing cannot be realized because the
number of program steps for the transfer process increases.

OBJECT AND SUMMARY OF THE INVENTION 

It is therefore an object of the present invention to provide a
processing unit in which a plurality of processing units execute
processes in parallel while using one another's operation results,
and in which the time required to reflect an operation result in the
other processing units is saved so that high speed processing is
enabled.

It is another object of the present invention to provide a processing
unit in which a plurality of processing units execute processes in
parallel while using one another's operation results, and in which
the number of steps of the processing program can be reduced.

It is a further object of the present invention to provide a
processing unit in which a plurality of processing units execute
processes in parallel while using one another's operation results,
which saves the time required to reflect an operation result in the
other processing units, and which generates and executes an
instruction format that reduces the number of steps of the processing
program.

According to an aspect of the present invention, there is provided a
processing apparatus having a plurality of processing circuits. Each
processing circuit includes a sending circuit which sends information
based on a result of an instruction in one of the processing circuits
to at least one other processing circuit during execution of the
instruction, and an ALU for executing instructions considering the
information sent from the one processing circuit.

According to an aspect of the present invention, there is provided
compiler software, stored on a storage medium, for generating
instruction strings used in a processing apparatus having a plurality
of processing circuits. The compiler software, when executed by a
computer, causes said computer to perform the steps of arranging
instructions into instruction strings to be executed by the
processing circuits in parallel, and generating for each instruction
an instruction field format used in one of the processing circuits.
The instruction field format has a send field for sending the result
of the instruction field to at least one other processing circuit.

According to another aspect of the present invention, there is
provided a method for executing instructions in a processing
apparatus having a plurality of processing circuits, including a step
for sending a result of an instruction in one of the processing
circuits to at least one other processing circuit while the
instruction is executed in the one processing circuit, and a step for
executing instructions in an ALU in the at least one other processing
circuit considering the result sent from the one processing circuit.

According to yet another aspect of the present invention, there is
provided an information processing system including a processing
apparatus having a plurality of processing circuits. Each processing
circuit includes a sending circuit which sends a result of an
instruction in one of the processing circuits to at least one other
processing circuit during execution of the instruction, and an ALU
for executing instructions considering the result sent from the at
least one other processing circuit. The system also includes a memory
for storing an instruction for the processing apparatus. The
instruction includes a field for reporting the result of the
instruction executed by the one processing circuit to the at least
one other processing circuit.

According to a further aspect of the present invention, there is
provided a processing apparatus for executing a plurality of
instruction fields in one instruction word in parallel, including a
plurality of registers provided corresponding to a plurality of
instruction field groups, each instruction field group including at
least one instruction field, a circuit for storing values to the
plurality of registers based on the result of an operation, and an
operation circuit for selecting whether an operation should be
executed or not based on an evaluation of the values stored in the
plurality of registers.

According to a further aspect of the present invention, there is
provided a processing apparatus for executing a plurality of
instruction fields in one instruction word in parallel, including a
plurality of operation units provided corresponding to at least one
instruction field among the plurality of instruction fields, each
operation unit further comprising: an instruction register for
holding said at least one instruction field; an operation circuit for
executing an operation corresponding to said at least one instruction
field; and a register for storing a value to determine execution or
non-execution of the operation of said operation circuit, wherein
said operation circuit determines execution or non-execution of said
instruction depending on the value written into said register.

According to each structure explained above, the number of times the
operation result is transferred between a plurality of processing
units executing a plurality of operations can be reduced, and thereby
the number of steps of the processing program can also be reduced.

Therefore, according to the present invention, there is provided a
processing unit in which a plurality of processing units execute
operations in parallel using one another's operation results, with a
view to enabling high speed operation.

In addition, according to the present invention, there is provided a
processing unit in which a plurality of processing units execute
operations in parallel using one another's operation results, with a
view to reducing the number of steps of the processing program.

Further, according to the present invention, there is provided a
processing unit in which a plurality of processing units execute
operations in parallel using one another's operation results, with a
view to executing an instruction format which can realize high speed
processing and reduce the number of steps of the processing program.

With the advantages described above, the present invention provides
the effect that the actual performance of the processing unit as a
whole can be improved remarkably.

BRIEF DESCRIPTION OF THE DRAWINGS

Other objects and advantages of the present invention will be
apparent from the following detailed description of the presently
preferred embodiments thereof, which description should be considered
in conjunction with the accompanying drawings, in which:

FIG. 1 is a block diagram showing a schematic structure of a
processing unit as another embodiment of the present invention;

FIG. 2 is an explanatory diagram showing a program example using the
C language;

FIG. 3 is an explanatory diagram showing an example compiled to the
format applied to the processing unit of the related art;

FIG. 4 is an explanatory diagram showing an example compiled to the
format applied to the predicate register type processing unit;

FIG. 5 is a block diagram showing the schematic structure of a
processing unit of a preferred embodiment of the present invention;

FIG. 6 is an explanatory diagram showing instruction words used in
the preferred embodiment of the present invention;

FIG. 7 is an explanatory diagram showing a program example using the
C language;

FIG. 8 is an explanatory diagram showing an example of the
instruction word string compiled to the format applied to the
predicate register type processing unit;

FIG. 9 is an explanatory diagram showing an example of the
instruction word string compiled to the format applied to the
predicate register type processing unit of the preferred embodiment;

FIG. 10 is an explanatory diagram showing a program example using the
C language;

FIG. 11 is an explanatory diagram showing an example of the
instruction word string compiled to the format applied to the
predicate register type processing unit;

FIG. 12 is an explanatory diagram showing an example of the
instruction word string compiled to the format applied to the
predicate register type processing unit having the broadcasting
function;

FIG. 13 is a block diagram showing a schematic structure of a
predicate register updating means;

FIG. 14 is a block diagram showing a schematic structure of the
processing unit as another embodiment of the present invention;

FIG. 15 is an explanatory diagram showing an example of the
instruction word string compiled to the format applied to the
predicate register type processing unit which can broadcast a read
value; and

FIG. 16 is a block diagram showing a schematic structure of the
information processing system as an application of the present
invention.
DESCRIPTION OF THE PREFERRED EMBODIMENTS

The preferred embodiments of the present invention will be explained
in detail with reference to the accompanying drawings.

FIG. 5 shows a schematic structure of a processing unit of a
preferred embodiment of the present invention. In FIG. 5, the
reference numerals 1 and 2 designate processing units; 3, an
instruction cache; 4, a data cache. Although not illustrated in the
figure, the processing units, instruction cache and data cache of the
present invention are desirably accommodated in a one-chip LSI. In
addition to the elements shown in the figure, the one-chip LSI also
comprises structural elements such as input/output sections, but
these are omitted here because they are not essentially related to
the subject matter of the present invention and can be formed with
well-known art.

As shown in FIG. 6, an instruction word used in this embodiment is
composed of an instruction 1 field and an instruction 2 field. Each
of these is in turn composed of a predicate register selection field
(P field), a predicate register broadcasting field (B field) and an
instruction field (I field). In FIG. 6, 601 designates an instruction
word; 611, the instruction 1 field; 612, the instruction 2 field; 621
and 631, P fields; 622 and 632, B fields; 623 and 633, I fields. The
instruction 1 field is supplied to the processing unit 1 and the
instruction 2 field to the processing unit 2.

The format of the instruction word indicated here is only an example
and may be modified within the scope of the appended claims. For
example, the number of instruction fields need not be two; there may
be more, for example four.
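
The layout of FIG. 6 can be pictured as a small C data structure. This is a sketch only: the member types, the constant NUM_FIELDS and the helper field_for_unit are hypothetical names chosen for illustration, and the text does not state the bit width of any field.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One instruction field (611 or 612 in FIG. 6).
 * Member widths are assumptions; the description gives no bit counts. */
struct instruction_field {
    uint8_t  p_field;   /* predicate register selection field (P field)    */
    bool     b_field;   /* predicate register broadcasting field (B field) */
    uint32_t i_field;   /* instruction field proper (I field)              */
};

/* Two fields in this embodiment; the text notes that more, for
 * example four, are equally possible. */
enum { NUM_FIELDS = 2 };

/* One instruction word (601 in FIG. 6). */
struct instruction_word {
    struct instruction_field field[NUM_FIELDS];
};

/* Returns the field supplied to processing unit `unit` (numbered from 1,
 * as in the text), or NULL when no such unit exists. */
const struct instruction_field *
field_for_unit(const struct instruction_word *w, int unit)
{
    if (unit < 1 || unit > NUM_FIELDS) {
        return NULL;
    }
    return &w->field[unit - 1];
}
```

Keeping the P and B fields beside each I field, rather than once per instruction word, is what lets each processing unit be predicated and broadcast independently of the others.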

Returning to FIG. 5, the processing unit 1 is formed of the following
structural elements: 101, an instruction register which holds the
instruction 1 field, intended for the processing unit 1, among the
instructions read from the instruction cache 3; 102, a data register
which holds operand data for processing in the processing unit 1;
103, an ALU which executes the instruction designated by the
instruction 1 field; 105, a predicate register which holds the
information indicating whether or not the ALU 103 executes the
instruction; 106, a selector for selecting the input data when the
value of the predicate register 105 is updated.

The processing unit 2 is formed of the same structural elements as
the processing unit 1. Namely, the processing unit 2 comprises an
instruction register 201, a data register 202, an ALU 203, a
predicate register 205 and a selector 206.

Here, the data registers 102 and 202 and the predicate registers 105
and 205 each have a plurality of register regions for selective use.
For example, when 32 data register regions are provided, five data
register selection signals are input to the data registers 102 and
202 in order to select one data register region from among them. Each
data register region can be identified by assigning it a number, for
example r0, r1, r2 and so on. This description applies not only to
the data registers but also to the predicate registers.

Next, the operation of this embodiment will be explained. Operations
of an ordinary ALU other than those described here can be realized by
the technique of the related art, and their detailed explanation is
omitted here.

First, the existing operations for executing the instruction string
shown in FIG. 4 will be explained. FIG. 4 shows an example of an
instruction string consisting of three instruction words which can be
processed by a single processing unit.

The first instruction word is a comparison instruction (cmp.gt) which
compares the data register region 1 (r1) with the immediate value "2"
and writes "true" to the predicate register region 0 (p0) when the
data register region 1 is larger, and "false" otherwise. Here, the
predicate register region to which the value is written, among the
plurality of predicate register regions included in the predicate
register 105, is designated using a part of the predicate register
selection field (P field) of the instruction 1 field. Alternatively,
the predicate register region to be written can be designated, for
example, by using a part of the instruction field (I field).

The second instruction word is a subtraction instruction (sub). In
this instruction, the immediate value "1" is subtracted from r1 and
the result is written back to r1. However, since this instruction
word follows the modifier "(p0)", the subtraction is executed if p0
is "true" and is not executed if p0 is "false".

The third instruction word is an addition instruction (add). When
this instruction is issued, the data register region 2 (r2) and r1
are added and the result is written back to r2.

Next, this embodiment will be explained with reference to FIG. 7,
FIG. 8 and FIG. 9. FIG. 7 shows another example of a program written
in the C language, while FIG. 8 shows an example of the instruction
word string compiled from the program of FIG. 7 into the format
applied to the predicate register type processing unit of the related
art. In FIG. 8, 501 designates an instruction word string applied to
the processing unit 1; 502, an instruction word string applied to the
processing unit 2. Here, to simplify the explanation, it is assumed
that the processing units 1 and 2 process the instruction words at
the same rate.

First, the processing unit 2 adds r3 and r4 according to the first
instruction word (add) and stores the result into r5. Simultaneously,
the processing unit 1 adds r1 and r2 according to its first
instruction word (add) and stores the result into r7.

In the following step, the processing unit 1 multiplies r7 by r6
according to the second instruction word (mul) and stores the result
into r7. Simultaneously, the processing unit 2 transfers the value of
a data register according to its second instruction word (xfer). In
this example, the value held in the data register region 5 (r5)
included in the data register 202 is written into the data register
region 5 (r5) included in the data register 102.

In the next step, the processing units 1 and 2 each execute a
comparison instruction according to the third instruction word
(cmp.gt) and each store the comparison result, that is, the value
obtained by evaluating the processing result, into the predicate
register region 0 (p0).

In the next step, when the value of p0 is "true", the processing unit
1 adds r7 and the immediate value 1 according to the fourth
instruction word (add) and stores the result into r7. Simultaneously,
when the value of p0 is "true", the processing unit 2 adds r5 and the
immediate value 2 according to its fourth instruction word (add) and
stores the result into r5.

Next, FIG. 9 shows an example of an instruction word string compiled
from the program shown in FIG. 7 into the format applied to the
predicate register type processing unit of this embodiment. In
FIG. 9, 511 designates an instruction word string applied to the
processing unit 1; 512, an instruction word string applied to the
processing unit 2.
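
The difference between the two compilations of the FIG. 7 program can be traced with a small C model of the two processing units. It is a sketch under several assumptions: the struct and function names are hypothetical, each unit is given eight data register regions and a single predicate region p0 for brevity, the variables are mapped to registers as r1=a, r2=b, r3=c, r4=d, r5=e, r6=f, r7=g, and the operands of the FIG. 9 instructions are taken to mirror those of FIG. 8. The broadcast is modeled, as in the abstract, as one comparison result being written to p0 of every processing unit.

```c
#include <stdbool.h>

/* State of one processing unit: its own data registers and p0. */
struct unit {
    int  r[8];
    bool p0;
};

/* Final e and g, plus the number of instruction words issued per unit. */
struct result { int e; int g; int steps; };

static void load_inputs(struct unit *u1, struct unit *u2,
                        int a, int b, int c, int d, int f)
{
    *u1 = (struct unit){0};
    *u2 = (struct unit){0};
    u1->r[1] = a; u1->r[2] = b; u1->r[6] = f;   /* unit 1 computes g */
    u2->r[3] = c; u2->r[4] = d;                 /* unit 2 computes e */
}

/* FIG. 8 style: r5 is copied to unit 1 with xfer, then both units compare. */
struct result run_with_xfer(int a, int b, int c, int d, int f)
{
    struct unit u1, u2;
    load_inputs(&u1, &u2, a, b, c, d, f);

    u1.r[7] = u1.r[1] + u1.r[2];    /* step 1: add r1, r2, r7      */
    u2.r[5] = u2.r[3] + u2.r[4];    /*         add r3, r4, r5      */
    u1.r[7] = u1.r[7] * u1.r[6];    /* step 2: mul r7, r6, r7      */
    u1.r[5] = u2.r[5];              /*         xfer r5             */
    u1.p0 = u1.r[5] > 9;            /* step 3: cmp.gt r5, #9, p0   */
    u2.p0 = u2.r[5] > 9;            /*         cmp.gt r5, #9, p0   */
    if (u1.p0) u1.r[7] += 1;        /* step 4: (p0) add r7, #1, r7 */
    if (u2.p0) u2.r[5] += 2;        /*         (p0) add r5, #2, r5 */

    return (struct result){ u2.r[5], u1.r[7], 4 };
}

/* FIG. 9 style: unit 2 compares once and broadcasts the p0 update. */
struct result run_with_broadcast(int a, int b, int c, int d, int f)
{
    struct unit u1, u2;
    load_inputs(&u1, &u2, a, b, c, d, f);

    u1.r[7] = u1.r[1] + u1.r[2];    /* step 1: add r1, r2, r7       */
    u2.r[5] = u2.r[3] + u2.r[4];    /*         add r3, r4, r5       */
    u1.r[7] = u1.r[7] * u1.r[6];    /* step 2: mul (assumed operands) */
    bool cmp = u2.r[5] > 9;         /*         cmp.gt r5, #9, p0, B */
    u2.p0 = cmp;                    /* local predicate update       */
    u1.p0 = cmp;                    /* same value broadcast to unit 1 */
    if (u1.p0) u1.r[7] += 1;        /* step 3: (p0) add in unit 1   */
    if (u2.p0) u2.r[5] += 2;        /*         (p0) add in unit 2   */

    return (struct result){ u2.r[5], u1.r[7], 3 };
}
```

In this model both versions produce the same e and g, but the broadcast version needs neither the xfer of the wide data register r5 nor a second comparison in unit 1, so each unit issues three instruction words instead of four.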






6,041,399 


This embodiment is characterized by the B parameter which is
additionally provided to the second instruction word (cmp.gt) of the
instruction word string 512. This parameter corresponds to the
predicate register broadcasting field (B field) in the instruction
field and determines whether or not the updated predicate register
should be broadcast to the other processing units. Of course, the
broadcast operation can be conducted in any manner in which the
result is communicated, sent or reported to one or more of the ALUs.

When this second instruction word is executed, by the end of its
execution cycle not only is the value held in the predicate register
region 0 (p0) of the processing unit 2 updated to the value based on
the result of the second instruction word, but the value held in the
predicate register region 0 (p0) of the other processing unit, the
processing unit 1 in this example, is also updated in the same way.
This operation is realized by the selectors 106 and 206 (FIG. 5) of
the processing units 1 and 2.

The B parameter of the instruction word string 512 controls both the
selector 106 of the processing unit 1 and the selector 206 of the
processing unit 2.

A compiler generating these instruction word strings 511 and 512 does
not generate instruction words in 511 and 512 which designate B
parameters simultaneously.

In ordinary operation, the value of the predicate register
broadcasting field (B field) is "false". In this case, the selector
selects the write data for the predicate register generated by the
ALU in the same processing unit and stores this data into the
predicate register. However, when the value of the predicate register
broadcasting field (B field) is detected as "true" because the B
parameter is designated in the instruction word, the selector selects
the write data output from the processing unit whose B field is
"true" and stores this data into the predicate register.

Unnecessary transfers of data registers can be saved by providing a
predicate register updating means capable of broadcasting an update
to the other processing unit.

Subsequently, other examples of the present invention are shown in
FIG. 10, FIG. 11 and FIG. 12. FIG. 10 shows an example of a program
written in the C language. FIG. 11 shows an example of the
instruction word string compiled from the program shown in FIG. 10
into the format applied to the predicate register type processing
unit. FIG. 12 shows an example of the instruction word string
compiled from the program shown in FIG. 10 into the format applied to
the predicate register type processing unit having the broadcasting
function.

In this embodiment, it is a precondition that the processing unit has
three or more processing units. As will be understood from the
figures, and as explained above, the instruction word string can be
simplified by generating an instruction word string which makes
effective use of the broadcasting function.

A compiler for the present invention also has this feature. In
compiling the C language instruction strings in FIG. 7, when the
compiler detects that the updates of "e" and "g" are independent of
each other, it decides that the operation on "e" and the operation on
"g" are executed in separate processing units, and when the compiler
detects that the updates of "e" and "g" both depend on the result of
a conditional expression, it decides that one processing unit
evaluates the conditional expression and also reports the result to
the other processing unit. The instruction strings shown in FIG. 9
are generated accordingly.

Another embodiment of the present invention will subsequently be
explained with reference to FIG. 1. FIG. 1 shows a schematic
structure of the processing unit like FIG. 5, and the processing unit
should desirably be formed on one chip LSI. The embodiment of FIG. 1
differs from the embodiment shown in FIG. 5 mainly in structure.
Namely, a different number of ALUs is comprised in one processing
unit (one in FIG. 5 and two in FIG. 1), and broadcasting of the write
data to the predicate register is executed via the bus 100 in place
of the selector. In the following explanation, the differences in
structure or operation from the first embodiment will be explained.

In this embodiment, the instruction word is composed of an
instruction 1 field, an instruction 2 field, an instruction 3 field
and an instruction 4 field. The structural elements of each
instruction field are the same as in the first embodiment. Of these
instruction fields, the instruction 1 field and the instruction 2
field are supplied to the processing unit 1 as the first instruction
field group, while the instruction 3 field and the instruction 4
field are supplied to the processing unit 2 as the second instruction
field group. Since each processing unit comprises two ALUs, the
instruction registers 101 and 201 are structured to store two
instruction fields simultaneously.

The values output from each ALU for writing into the predicate
register are written into the predicate register in the same
processing unit and can also be output to the bus used for
broadcasting to the other predicate registers. The values output from
each ALU for writing into the predicate register are simultaneously
broadcast to the designated predicate register regions of all
processing units via this broadcasting bus and are then stored
therein.

With the introduction of such a structure, the number of processing
units and ALUs can be increased while minimizing the increase in
signal delay caused by a multiple-input selector and the mounting
area needed to connect many signals independently.

Next, FIG. 13 shows the detail of the predicate register updating
means shown in FIG. 1. In FIG. 13, reference numeral 11 designates a
predicate register broadcast mediating circuit to determine from
which ALU the output should be broadcast; 12, a predicate register
broadcast selector for selecting the output determined by the circuit
11; 13, an exception generating circuit in relation to the
broadcasting; 14, an exception signal generated from the exception
generating circuit; 15, an exception processing circuit for receiving
the exception signal 14; 16, an operation stop signal output from the
exception processing circuit 15; 107 and 207, instruction mask
devices (AND circuits) to determine whether the ALUs 103 and 203
should be operated or not based on the values stored in the predicate
registers 105 and 205; 108 and 208, predicate register broadcasting
data supplying means (OR circuits) for supplying the predicate
register write value when it is broadcast. The structural elements
should desirably be formed on one chip LSI. A dotted line in FIG. 13
indicates the circuit connecting the ALU and the data register, which
is not directly related to the above explanation.

The predicate register broadcast mediating circuit 11 determines the
ALU whose operation result is to be broadcast on the basis of the
value of the B field included in the instruction registers 101 and
201 and informs the predicate register broadcasting selector 12 of
it. The determination method can be selected freely. For example, a
method depending on a fixed priority sequence preset for the
processing units, or a method in which an exception is generated as a
resource collision error upon simultaneous broadcasting from a
plurality of processing units, may be considered. The exception
generating circuit 13 is used to generate such an exception signal
14, and the exception processing circuit 15 supplies the operation
stop signal 16 depending on this exception signal 14. Thereby,
operation of the processing unit is stopped, resulting in the reset
waiting condition.

The predicate register broadcasting selector 12 broadcasts, if
necessary, the result of operation of each processing unit based on
the determination of the mediating circuit 11.

The broadcast data is supplied to the predicate registers 105 and 205
by the predicate register broadcast data supplying means 108 and 208.
The predicate register broadcast data supplying means 108 and 208 are
formed of a circuit for obtaining a simple logical sum (OR). This
circuit is not always required to be a means for obtaining the OR.
For example, it may be a selecting circuit for selectively supplying
the broadcast data, when the broadcast data is

FIG. 15, depending on the processing unit number designated in the
instruction word for making reference to the predicate register.

With the introduction of this structure, the number of processing
units and ALUs can be increased while the signal delay due to a
multiple-input selector and the increase of the mounting area needed
to connect many signal lines independently are minimized.

Although preferred embodiments of the present invention have been
described and illustrated, it will be apparent to those skilled in
the art that various modifications may be made without departing from
the principles of the invention.

For example, in the above embodiments, it is described that two
processing units at maximum are used and that two ALUs at maximum are
used for one processing unit, but there is no restriction on these
values and the number of processing

being supplied or supplying the data output from ALU units and ALU can be increased or decreased as required, 

within the processing unit when such broadcast data is not Moreover, structure of the instruction word and instruc- 

supphed. uon fi e id is described in above embodiments but this struc- 

It is assumed that the predicate registers 105 and 205 have ture itself is not restricted thereto and may be modified 

a plurality of memory regions. At the time of storing a value freely. 

to the predicate register, the predicate register region in £ n above embodiments, it is also described that data 

which the value is to be stored is determined by the value of writing and reading can be made to the predicate register 

operand stored in the instruction registers 101 and 201. In using the particular bus, but it is only for simplification of 

addition, the predicate register from which the value is read explanation. For example, it is of course possible to employ 

at the time of reading the value from the predicate register the structure to transfer the data between the desired data 

is also determined by the value of operand stored in the register and desired predicate register, 

instruction registers 101 and 201. Such information related M described previously, according to the present 

to selection of the predicate register region is stored in the 3o mvC ntion, there is provided a processing unit comprising a 

P field or I field of the instruction register. plurality of processing units to execute in parallel a plurality 

In above embodiments, selection of the predicate register of operations, wherein an instruction format, which can save 

value between the processing units has been realized by the the number of times of the transfer operation of operation 

method for selecting the data to be written into the predicate result between a plurality of processing units, is utilized, 

register. Namely, the predicate register value has been 35 Thereby, the number of steps of the processing program can 

selected by changing the ALU of which output is written into be reduced. 

the predicate register. However, selection of the predicate Furthermore, since the number of signal lines required for 

register value can be realized by using the method for update of the predicate registers between the processing 

selecting the data to be read from the predicate register. units is smaller than the bit width required for transfer of 

An embodiment using this method will be explained with 40 contents of the data register, the circuits may be mounted 

reference to FIG. 14 and FIG. 15. FIG. 15 shows an example within a narrower area and transfer of data between the 

of the instruction word string compiled from the program processing units may be realized at a higher speed, 

shown in FIG. 10 into the format applied to the predicate FIG. 16 shows the information processing system using 

register type processing unit having the function to broad- the present invention as a sub processor for processing 

cast the read value. Here, the processing unit number is 45 multimedia data. In FIG. 16, 20 designates a main processor 

designated in the instruction word indicating the predicate which wholly manages the information processing system, 

register in order to show the predicate register to which 21 designates a main memory for storing main program, 22 

reference should be made. For example, the heading section designates a sub processor which operates according to the 

"(pO: 1)" of the third instruction word of the processing unit present invention as described above with respect to FIGS. 

2 specifies that reference should be made to the predicate 50 1,5,13 and 14, 23 designates a sub memory which the 

register region l(pl) of the processing unit 1. instruction word strings are stored in, 24 designates a 

FIG. 14 shows a schematic structure of the processing unit multimedia I/O circuit for managing multimedia data for the 

of this embodiment to realize the function explained above. sub processor 22, 25 designates a display circuit for dis- 

In this embodiment, there is a structural difference from the playing results or indicating operations, 26 designates an 

embodiment of FIG. 1 in the point that the writing value of 55 audio circuit for inputting/outputting audio data, 27 desig- 

the predicate register is not broadcasted but the read value nates a network circuit for communicating data via network 

from the predicate register is broadcasted. Difference of connected, 28 designates a peripheral circuit for controlling 

structure or operation from the above embodiment will be peripheral devices, 29 designates a keyboard device for 

mainly explained below. inputting an operator's instruction, etc, 30 designates a disk 

The values which are outputted from each ALU for 60 storage device for storing the instruction word strings 

writing into the predicate register are written respectively executed in the sub processor or data used for the program 

into the predicate registers provided in the processing unit. and said main program, 31 designates a bus which infor- 

The values read from the predicate registers are outputted to mation flows on. 

the bus 140 for predicate broadcast. By way of this bus, each According to the system of the present invention, because 

ALU can realize reference to the predicate register value 65 the number of steps can be reduced, it is possible not only 

stored in any processing unit. The value of predicate register to enable high speed processing but to reduce the capacity of 

to be referred is determined, as explained above in regard to the sub memory and the storage device. 
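The write-side broadcast described above (mediating circuit 11, selector 12, exception signal 14 and data supplying means 108 and 208) may be easier to follow as a small software model. The following Python sketch is illustrative only and is not part of the disclosed hardware. The names `Unit`, `PredWrite`, `mediate`, `step` and `run_fig7` are hypothetical, the lowest-index-wins priority order is an assumption, and the selecting-circuit variant is modelled instead of the OR circuit. `run_fig7` replays the program of FIG. 7 with one unit computing the comparison and broadcasting the predicate to the other.

```python
from dataclasses import dataclass, field
from typing import List, Optional


class BroadcastCollision(Exception):
    """Hypothetical stand-in for exception signal 14 (resource collision)."""


@dataclass
class Unit:
    # Predicate register with several regions (two here, an arbitrary choice).
    pred: List[int] = field(default_factory=lambda: [0, 0])


@dataclass
class PredWrite:
    unit: int        # index of the processing unit whose ALU produced the value
    region: int      # predicate register region to write
    value: int       # comparison result, 0 or 1
    broadcast: bool  # models the B field of the instruction


def mediate(writes: List[PredWrite], policy: str = "priority") -> Optional[PredWrite]:
    """Pick the write to broadcast, in the manner of mediating circuit 11."""
    requests = [w for w in writes if w.broadcast]
    if not requests:
        return None
    if len(requests) > 1 and policy == "exception":
        raise BroadcastCollision("simultaneous broadcast from several units")
    # Fixed priority: the lowest unit index wins (an assumed ordering).
    return min(requests, key=lambda w: w.unit)


def step(units: List[Unit], writes: List[PredWrite], policy: str = "priority") -> None:
    """Apply one instruction word's predicate writes to all units."""
    winner = mediate(writes, policy)
    # Writes that are not broadcast stay local to the issuing unit.
    for w in writes:
        if w is not winner:
            units[w.unit].pred[w.region] = w.value
    # Selecting-circuit variant: a supplied broadcast value overrides local data.
    if winner is not None:
        for u in units:
            u.pred[winner.region] = winner.value


def run_fig7(a: int, b: int, c: int, d: int, f: int, policy: str = "priority"):
    """Replay the FIG. 7 program on two units that share one broadcast predicate."""
    units = [Unit(), Unit()]
    g = (a + b) * f  # unit 0: add, then mul
    e = c + d        # unit 1: add
    # Unit 1 evaluates e > 9 and broadcasts the result into region 0 everywhere.
    step(units, [PredWrite(unit=1, region=0, value=int(e > 9), broadcast=True)], policy)
    if units[0].pred[0]:  # unit 0: predicated g++
        g += 1
    if units[1].pred[0]:  # unit 1: predicated e = e + 2
        e += 2
    return e, g
```

In this model only a single predicate bit crosses between the units, so no transfer of the data register holding "e" is needed before unit 0 can decide whether to increment "g".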
While the present invention has been described in detail
and pictorially in the accompanying drawings it is not 
limited to such details since many changes and modifica- 
tions recognizable to those of ordinary skill in the art may be 
made to the invention without departing from the spirit and 
the scope thereof. 

What is claimed is: 

1. A processing apparatus executing a plurality of instruc- 
tion fields in one instruction word in parallel, comprising: 

a plurality of registers provided corresponding to a plu- 
rality of instruction field groups, each instruction field 
group including: 
at least one instruction field, 

a circuit for storing values to a plurality of registers 
based on the result of an operation, and 

an operation circuit for selecting whether an operation 
should be executed based on an evaluation of the 
values stored in said plurality of registers, said 
operation circuit having a register which is indepen- 
dent of the other registers in the other operation 
circuits, each of said independent registers being 
updated with a result of the evaluation. 

2. A processing apparatus according to claim 1, further 
comprising: 

a circuit for selectively storing the values obtained by 
evaluating the result of the operation in the correspond- 
ing instruction field group and the value obtained by 
evaluating the result of the operation of the groups 
other than the corresponding instruction field group 
into said plurality of registers; and 

wherein said plurality of registers are evaluated only by 
the corresponding instruction field groups respectively. 

3. A processing apparatus according to claim 2, wherein 
said circuit for storing values into said plurality of registers 
selects and stores, when the value obtained by evaluating the 
result of operation in the other instruction field group is sent 
into the values being sent and otherwise selects and stores 
the value obtained by evaluating the result of an operation in 
the corresponding instruction field group. 

4. A processing apparatus according to claim 3, wherein 
said instruction field includes an instruction bit string for 
determining whether the value evaluating the result of an 
operation should be sent to said independent register each 
being included in at least one other instruction field group. 

5. A processing apparatus according to claim 1, further 
comprising: 

a circuit for selectively evaluating values stored in said 
register of the corresponding instruction field group and 
values stored in said register of the groups other than 
the corresponding instruction field group; and 

wherein said plurality of registers are evaluated by any 
one instruction field group. 

6. A processing apparatus according to claim 5,
wherein said circuit for evaluating the values stored in
said plurality of registers selects and evaluates, when
the value stored in any one of said registers is broad- 
casted to all of a plurality of instruction field groups, 
the values being broadcasted and otherwise selects and 
evaluates the value stored in said register of the corre- 
sponding instruction field group. 

7. A processing apparatus according to claim 6, wherein 
said instruction field includes an instruction bit string for 
determining whether the value stored in any one of said 
plurality of registers should be sent to said independent 
register each being included in at least one other instruction 
field group. 
8. A processing apparatus executing a plurality of instruction
fields in one instruction word in parallel, comprising:

a plurality of registers provided corresponding to a plu- 
rality of instruction field groups, each instruction field 
group including: 
at least one instruction, 

a circuit for storing values to a plurality of registers 
based on the result of operation, 

an operation circuit for selecting whether operation 
should be executed or not based on evaluation of the 
values stored in said plurality of registers, 

a circuit for selectively storing the values obtained by 
evaluating the result of the operation in the corre- 
sponding instruction field group and the value 
obtained by evaluating the result of the operation of 
the groups other than the corresponding instruction 
field group into said plurality of registers, 

wherein said plurality of registers are evaluated only by 
the corresponding instruction field groups respec- 
tively; 

wherein said circuit for storing values into a plurality of 
registers selects and stores, when the value obtained 
by evaluating the result of operation in the other 
instruction field group is sent into the values being 
sent and otherwise selects and stores the value 
obtained by evaluating the result of operation in the 
corresponding instruction field group; 

a circuit for detecting that the value obtained by evalu- 
ating result of operation is sent out from two or more 
of instruction field groups, and 

a circuit for generating an exception signal based on the 
result of the detection. 

9. A processing apparatus executing a plurality of instruc- 
tion fields in one instruction word in parallel, comprising: 

a plurality of registers provided corresponding to a plu- 
rality of instruction field groups, each instruction field 
group including: 
at least one instruction field, 

a circuit for storing values to a plurality of registers 
based on the result of operation, 

an operation circuit for selecting whether operation 
should be executed or not based on evaluation of the 
values stored in said plurality of registers, 

a circuit for selectively evaluating values stored in the 
register of the corresponding instruction field group 
and values stored in the register of the groups other 
than the corresponding instruction field group, 

wherein said plurality of registers are evaluated by any 
one instruction field group; 

wherein said circuit for evaluating the values stored in 
a plurality of registers selects and evaluates, when 
the value stored in any one of said registers is 
broadcasted to all of a plurality of instruction field 
groups, the values being broadcasted and otherwise 
selects and evaluates the value stored in said register 
of the corresponding instruction field group; 

a circuit for detecting that the value stored in any one 
of said registers is sent out from two or more of 
instruction field groups, and 

a circuit for generating an exception signal based on the 
result of the detection. 

10. A processing apparatus according to claim 9,
wherein said value obtained by evaluating the result of
operation is sent by way of a common bus line.

11. A processing apparatus executing a plurality of 
instruction fields in one instruction word in parallel, com- 
prising: 
a plurality of operation units provided corresponding to at
least one instruction field among said plurality of
instruction fields,
each operation unit comprising: 

an instruction register for holding said at least one 
instruction field, 

an operation circuit for executing an operation corre- 
sponding to said at least one instruction field, and 

a register for storing a value used to determine whether 
execution of an operation by said operation circuit is 
to be performed, 

wherein said operation circuit determines whether 
execution of said instruction is to be performed 
depending on the value written into said register, said 
operation circuit having a register which is indepen- 
dent of the other registers in the other operation 
units, each of said independent registers being 
updated with a result of the evaluation. 
12. A processing apparatus according to claim 11, further
comprising: 

a circuit for writing the value obtained by evaluating the 
result of an operation of a predetermined instruction to 
the registers in at least one operation unit; and 

wherein said operation circuit determines execution or not 
execution of said instruction for which the register is 
designated. 

13. A processing apparatus according to claim 11, further 
comprising: 

a circuit for writing the value obtained by evaluating the 
result of an operation of a predetermined instruction to 
the register in its own operation unit; and 

wherein said operation circuit determines execution or not 
execution of the instruction for which said register in 
any operation unit is designated. 